semantic mask
1 Hosting, Licensing, and Maintenance Plan
The dataset will be available for a minimum of five years, with no plans for removal. We will ensure ongoing maintenance to verify and preserve data accessibility. For what purpose was the dataset created? Was there a specific task in mind? Who created the dataset (e.g., which team or research group), and on behalf of which entity? Who funded the creation of the dataset?
- North America > United States > Illinois > Cook County > Chicago (0.05)
- North America > United States > California > San Francisco County > San Francisco (0.05)
- North America > United States > California > Los Angeles County > Los Angeles (0.05)
- Law (0.68)
- Information Technology > Security & Privacy (0.68)
- Asia > China > Jiangsu Province > Nanjing (0.04)
- North America > United States > California (0.04)
- Asia > China > Tianjin Province > Tianjin (0.04)
- Asia > China > Beijing > Beijing (0.04)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Bridging the Domain Gap: Self-Supervised 3D Scene Understanding with Foundation Models
Foundation models have achieved remarkable results in 2D and language tasks such as image segmentation, object detection, and vision-language understanding. However, their potential to enrich 3D scene representation learning remains largely untapped due to the domain gap. In this work, we propose Bridge3D, a methodology that addresses this gap by pre-training 3D models using features, semantic masks, and captions sourced from foundation models. Specifically, our method employs semantic masks from foundation models to guide the masking and reconstruction process of the masked autoencoder, enabling more focused attention on foreground representations.
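As a rough illustration of the idea of mask-guided masking, the sketch below biases an autoencoder's patch-masking step toward foreground patches identified by a foundation-model semantic mask. The function name, the `fg_bias` parameter, and the weighted-sampling scheme are hypothetical illustrations, not Bridge3D's actual implementation:

```python
import numpy as np

def mask_guided_sampling(fg_mask, mask_ratio=0.6, fg_bias=0.8, rng=None):
    """Choose which patches to mask, biased toward foreground patches.

    fg_mask: boolean array of shape (num_patches,), True where a
             foundation-model semantic mask marks foreground.
    Returns a boolean array: True = patch is masked (to be reconstructed).
    """
    rng = np.random.default_rng() if rng is None else rng
    n = fg_mask.shape[0]
    n_masked = int(round(mask_ratio * n))
    # Sampling weights: foreground patches are more likely to be masked,
    # so the autoencoder must reconstruct object regions, not background.
    weights = np.where(fg_mask, fg_bias, 1.0 - fg_bias)
    weights = weights / weights.sum()
    masked_idx = rng.choice(n, size=n_masked, replace=False, p=weights)
    out = np.zeros(n, dtype=bool)
    out[masked_idx] = True
    return out
```

With a high `fg_bias`, most of the reconstruction budget lands on object regions, which is the intuition behind focusing attention on foreground representations.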
FreeMask: Synthetic Images with Dense Annotations Make Stronger Segmentation Models
Semantic segmentation has witnessed tremendous progress thanks to various advanced network architectures. However, these architectures demand large amounts of finely annotated data for training, and acquiring such annotations is laborious and costly. We therefore present FreeMask, which resorts to synthetic images from generative models to ease the burden of both data collection and annotation. Concretely, we first synthesize abundant training images conditioned on the semantic masks provided by realistic datasets.
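The key economy here is that a synthetic image generated from a real mask inherits that mask as a free dense annotation. A minimal sketch of assembling such a mixed training set, assuming the synthetic images have already been generated elsewhere (the function name and data layout are illustrative, not FreeMask's API):

```python
def mix_real_synthetic(real_pairs, synth_per_mask):
    """Build a combined training set: each real (image, mask) pair is kept,
    and synthetic images generated for the same mask reuse that mask as
    their dense annotation, so no extra labelling is needed.

    real_pairs: list of (image, mask) tuples
    synth_per_mask: dict mapping real-pair index -> list of synthetic images
    """
    combined = []
    for i, (img, mask) in enumerate(real_pairs):
        combined.append((img, mask))
        for synth_img in synth_per_mask.get(i, []):
            combined.append((synth_img, mask))  # mask is free supervision
    return combined
```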
SGDFuse: SAM-Guided Diffusion for High-Fidelity Infrared and Visible Image Fusion
Zhang, Xiaoyang, Li, Jinjiang, Fan, Guodong, Ju, Yakun, Fan, Linwei, Liu, Jun, Kot, Alex C.
Infrared and visible image fusion (IVIF) aims to combine the thermal radiation information from infrared images with the rich texture details from visible images to enhance perceptual capabilities for downstream visual tasks. However, existing methods often fail to preserve key targets due to a lack of deep semantic understanding of the scene, while the fusion process itself can also introduce artifacts and detail loss, severely compromising both image quality and task performance. To address these issues, this paper proposes SGDFuse, a conditional diffusion model guided by the Segment Anything Model (SAM), to achieve high-fidelity and semantically-aware image fusion. The core of our method is to utilize high-quality semantic masks generated by SAM as explicit priors to guide the optimization of the fusion process via a conditional diffusion model. Specifically, the framework operates in a two-stage process: it first performs a preliminary fusion of multi-modal features, and then utilizes the semantic masks from SAM jointly with the preliminary fused image as a condition to drive the diffusion model's coarse-to-fine denoising generation. This ensures the fusion process not only has explicit semantic directionality but also guarantees the high fidelity of the final result. Extensive experiments demonstrate that SGDFuse achieves state-of-the-art performance in both subjective and objective evaluations, as well as in its adaptability to downstream tasks, providing a powerful solution to the core challenges in image fusion. The code of SGDFuse is available at https://github.com/boshizhang123/SGDFuse.
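The conditioning step described above — feeding both the preliminary fused image and the SAM mask into the diffusion model — can be sketched as a simple channel-stacking operation. This is an assumption about how such a condition tensor might be assembled, not SGDFuse's actual code; the normalization scheme is illustrative:

```python
import numpy as np

def build_condition(prelim_fused, sam_mask):
    """Channel-stack the preliminary fused image with the SAM mask so the
    denoiser sees both the coarse fusion result and explicit semantics.

    prelim_fused: (H, W) grayscale fused image in [0, 1]
    sam_mask:     (H, W) integer segment ids from SAM
    """
    mask_chan = sam_mask.astype(np.float32)
    if mask_chan.max() > 0:
        mask_chan /= mask_chan.max()  # normalise segment ids to [0, 1]
    return np.stack([prelim_fused.astype(np.float32), mask_chan], axis=0)
```

At each denoising step, the diffusion model would receive this condition tensor alongside the noisy sample, giving the generation process the explicit semantic directionality the abstract refers to.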
- Europe > United Kingdom > England > Leicestershire > Leicester (0.04)
- Asia > China > Shandong Province > Yantai (0.04)
- Asia > China > Fujian Province > Fuzhou (0.04)
- (2 more...)
End-to-end Autonomous Vehicle Following System using Monocular Fisheye Camera
Zhang, Jiale, Qian, Yeqiang, Qin, Tong, Jiang, Mingyang, Chen, Siyuan, Yang, Ming
The increase in vehicle ownership has led to greater traffic congestion, more accidents, and higher carbon emissions. Vehicle platooning is a promising way to address these issues by improving road capacity and reducing fuel consumption. However, existing platooning systems rely on lane markings and expensive high-precision sensors, which limits their general applicability. To address these issues, we propose a vehicle following framework that extends operation from restricted scenarios to general ones using only a camera. This is achieved through our newly proposed end-to-end method, which improves overall driving performance. The method incorporates a semantic mask to mitigate causal confusion in multi-frame data fusion. Additionally, we introduce a dynamic sampling mechanism to precisely track the trajectory of the preceding vehicle. Extensive closed-loop validation in real-world vehicle experiments demonstrates the system's ability to follow vehicles in various scenarios, outperforming traditional multi-stage algorithms and making it a promising, cost-effective solution for autonomous vehicle platooning. A complete real-world vehicle experiment is available at https://youtu.be/zL1bcVb9kqQ.
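One plausible way a semantic mask counters causal confusion in multi-frame fusion is to suppress background in the history frames, so the network cannot latch onto shortcut cues and must attend to the preceding vehicle itself. The sketch below is a hypothetical illustration of that idea under this assumption, not the paper's actual mechanism:

```python
import numpy as np

def mask_history_frames(frames, vehicle_mask):
    """Zero out background in past frames before multi-frame fusion.

    frames:       (T, H, W) stack of past grayscale frames
    vehicle_mask: (H, W) boolean mask of the preceding vehicle

    Only the masked (vehicle) region survives, so the fused features
    depend on the lead vehicle's appearance and motion rather than on
    spurious background correlations (a source of causal confusion).
    """
    return frames * vehicle_mask[None, :, :].astype(frames.dtype)
```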
- Asia > China > Shanghai > Shanghai (0.06)
- North America > United States > California (0.04)
- Asia > China > Shaanxi Province > Xi'an (0.04)
- (2 more...)
- Transportation > Ground > Road (1.00)
- Energy (1.00)
- Transportation > Passenger (0.93)